Application of a semivariogram based on a deep neural network to Ordinary Kriging interpolation of elevation data

The Ordinary Kriging method is a common spatial interpolation algorithm in geostatistics. Because the semivariogram required for kriging interpolation greatly influences this process, optimal fitting of the semivariogram is of major significance for improving the theoretical accuracy of spatial interpolation. A deep neural network is a machine learning algorithm that can, in principle, be applied to any function, including a semivariogram. Accordingly, a novel spatial interpolation method based on a deep neural network and Ordinary Kriging was proposed in this research, and elevation data were used as a case study. Compared with the semivariogram fitted by the traditional exponential model, spherical model, and Gaussian model, the kriging variance in the proposed method is smaller, which means that the interpolation results are closer to the theoretical results of Ordinary Kriging interpolation. At the same time, this research can simplify processes for a variety of semivariogram analyses.


Introduction
Of the many available Kriging methods for spatial interpolation, Ordinary Kriging (OK) [1][2][3] is considered to be the best linear unbiased estimation method. The fitting effect of the semivariogram [1,4,5] of the OK method has a significant impact on the interpolation accuracy [6]. The commonly used semivariogram models include the Gaussian, spherical, and exponential models [7]. In OK interpolation, the optimal selection is generally obtained by comparing many semivariogram models [8][9][10][11][12], and then the optimal model is obtained by manual parameter adjustment. To improve the goodness of fit of the semivariogram model, the least square method [13][14][15], the maximum likelihood method [16] and the maximum-minimum method [17] can be used to fit the semivariogram function. At present, there are two problems related to semivariogram fitting methods. First, it is necessary to manually analyze and compare the semivariogram to select the appropriate semivariogram model. Second, the least square and maximum likelihood methods for improving the goodness of fit of the semivariogram are limited by the selected semivariogram model. Jo H. and Pyrcz MJ. [18] proposed the use of a convolutional neural network (CNN) to fit the semivariogram. Although this approach realizes automatic fitting and improves the goodness of fit, it relies on a large number of samples and is not suitable for small sample set. To reduce the cumbersome process of manually selecting the semivariogram model, improve the goodness of fit of the semivariogram model, a deep neural network (DNN) was proposed to simulate the semivariogram for interpolation. Theoretically, the DNN can effectively fit any function [19,20], including semivariograms. Because multiple learning layers are defined, and multiple neurons are defined in each learning layer, the DNN is obviously more effective than the ANN in fitting the semivariogram. The most important condition that the semivariogram must satisfy is the first law of geography [21][22][23]. This law shows that the correlation between similar things is stronger than that between dissimilar things, which can be satisfied by the semivariogram simulated by the DNN; that is, the semivariogram simulated by the selected sample points must be an increasing function or an approximately increasing function. By calculating the gradient of the function simulated by the DNN, we can determine the increase and decrease of the variation function at the sample point. Based on the smoothing effect, the traditional semivariogram [24,25] is unable to fully reflect the real situation, while the semivariogram fitted by the DNN can accurately describe the relationship between things. In this study, a DNN was used to simulate the semivariogram, and the OK interpolation results were compared with the results of the exponential and Gaussian models to verify the effectiveness and rationality of the proposed method.

Fitting of semivariograms
In this study, Keras of TensorFlow based on Python was used to establish a DNN model for semivariogram fitting. The experimental data were obtained from http://www.gscloud.cn/ (52.635N, 58.748W). Elevation data covering an area of 50 pixels by 50 pixels were selected for testing. Elevation data for 100 points were obtained by uniformly sampling the dataset. A dataset of 2500 pixels was obtained by interpolation and verified by comparison with the original dataset. An isotropic random field was assumed for the dataset used in this study. The r (Eq 1) and h (Eq 1) values of 20 data points were obtained by uniform sampling of 100 data points and used to fit the semivariogram model m (Eq 1). h (Eq 1) represents the distance between any two sample points, and r (Eq 1) represents half of the square of the difference between the values of any two sample points. y (Eq 1) is the gradient, which in this study, refers to the increase or decrease in (h,r). The increase or decrease in the semivariogram is determined by y (Eq 1).
The process of fitting the semivariogram model m (Eq 1) is as follows: 1. Set the DNN training parameters: Set 15 fully connected layers and set the activation function to Relu. The input and output layer activation function is Softplus. The input and output layers have weights of 1. The setting initialization scheme for the other weights is henormal. Set the input and output layer neurons to 1 (to ensure that the input and output are one-dimensional data). The optimizer is Adam, the learning rate is 0.001, the loss function is the MSE, and the evaluation function is the accuracy and the MSE.
2. The semivariogram must obey the first law of geography, so it is limited to an incremental function. The gradient of a sample point determines the increase or decrease at that point. Set h (input layer) and r (output layer) as the DNN training data. During model training, each training occurs 1 times, and the gradient value y of the model (m in Eq 1) is calculated to determine if y has a negative value. When the gradient value y (Eq 1) appears negative, the model for which the last gradient value y is all positive is selected (to ensure that the semivariogram is the best fit and an increasing function). Fig 1 shows the effect of training 10, 50, 100, and 150 times, including the gradual approach of the curve to the true distribution of the data.
3. The model with k-fold cross-validation [26,27] distinguishes the training data set and validation data set. The data set of the fitting semivariogram is divided into 10 parts (k = 10), with a total of 10 training iterations. In every training, one part is used as the validation data set, and the other 9 parts are used as the training set to obtain as much effective information as possible in the limited data and optimize the weight of the model.
Compared with ANNs, DNNs have more fully connected layers. The influence of the number of fully connected layers on the training times is shown in Fig 2. Thus, multiple fully connected layers can accelerate the fitting speed of the semivariogram. The rationality of the semivariogram fitted by the DNN was verified by comparison against that of the semivariograms fitted using the spherical, exponential, and Gaussian models. Based on the test of many different DEM data, the following conclusions were obtained for the parameter selection of the initial DNN model based on the proposed method:  1. If the learning rate is too low, the fitting accuracy will be improved, but the training speed will be reduced.
2. As the number of fully connected layers increases, the ease of semivariogram fitting increases, and the number and length of the training times decrease. However, the training time will not be reduced when there are more fully connected layers to a certain extent. The relationship between the number of fully connected layers and the training time will vary according to different data.
3. The activation function is Relu. Relu increases the neuron fitting speed but is prone to neuron death, and the weight cannot be updated. Therefore, the input layer and output layer activation function is set to SoftMax. If all activation functions are Softplus, the effect can be achieved, but the training speed will be very slow. Linear, Sigmoid and Tanh activation functions are not suitable for nonlinear regression according to the algorithm principle.
4. The weight of the input and output layers is initialized to 1 (to prevent gradient disappearance caused by Relu), and the other initialization schemes are he-normal, which can increase the training speed. When the weight is initialized to 0, the training speed is particularly slow. Lecun and Glorot initialization schemes may cause gradient disappearance or gradient explosion, resulting in under-fitting.
5. The optimizer is Adam in the proposed method. Adam is an improvement on RMSprop and Momentum. Adam shows good effects and a short training time. SGD needs to rely on minimal learning rates to achieve good results, and the training time is long.
6. Common loss functions such as the mean square error, mean absolute error, mean squared logarithmic error, and mean absolute percentage error functions produce small differences in this research.
7. To improve the model accuracy, the accuracy and the mean absolute error are used as the evaluation function.

OK interpolation
Kriging is one of the main techniques used in geostatistics. A spatial correlation range is determined, and interpolation is performed between data points based on the sampling points within this range. A semivariogram is used to perform unbiased and optimal estimation of attribute values within a limited range and in conjunction with correlation analysis to spatially correlate variables within a specified range. Therefore, kriging produces linear unbiased and optimal estimates for unknown data points based on real data for variables within a specified range and the structural characteristics of a semivariogram. The most significant advantage kriging offers over other interpolation methods is the minimization of the calculation variance.
Cressie [1] provided the derivation of the kriging method. When the sum of the weights (λ) of sample points to unknown points in the kriging method is 1, the method is Ordinary Kriging. The OK interpolation process of a DNN-simulated semivariogram is given as follows: 1. According to the neighborhood search strategy, n known sample points are selected to simulate the semivariogram using the DNN. First, the r (Eq 2) and h (Eq 3) values of any two points z i and z j are calculated, where h represents the distance between any two sample points, and r represents half of the square of the difference between the values of any two sample points. x (Eq 3) and y (Eq 3) represent the coordinates of the sample points. Then, h (input of DNN model) and r (output of DNN model) are introduced into the DNN model for training. Specific parameters are set according to the previous chapter until the model reaches the optimal fit.
h ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffiffi 2. The distance between an unknown point and the known sample point is substituted into the DNN model to obtain the value of the semivariogram. m (Eq 4) represents the DNN model, h (Eq 4) represents the distance between unknown points and n known sample points, and r (Eq 4) represents the correlation between unknown points and known sample points (semivariogram value).
3. Calculate the weight λ(Eq 5) of known sample points to unknown sample points using OK interpolation. K (Eq 5) is the semivariogram value of any two sample points calculated according to Eq 2, and D (Eq 5) is the semivariogram value of the calculated unknown point and the known sample point (obtained according to Eq 4).
4. Substitute λ (Eq 5) from the previous step into Eq 7 to calculate the value of the unknown point. Z 0 (Eq 7) represents the value of the unknown point, and Z i (Eq 7) represents the value of n known sample points.
5. The kriging variance, which is calculated according to Eq 8, indicates the deviation between the interpolation results and the theoretical results. Z 0 (Eq 8) represents the calculated real value,Ẑ 0 (Eq 8) represents the calculated theoretical value, and varðẐ 0 À Z 0 Þ (Eq 8) represents the degree of deviation between the calculated and theoretical values.

Comparative analysis of the results
In this study, to verify the rationality of the semivariogram fitted by the DNN, the interpolation results of the fitted semivariogram were compared with those of the Gaussian and exponential models. Before OK interpolation, the normal distribution verification must be carried out for the uniformly selected samples in the four regions. The P-value is obtained through the Kolmogorov Smirnov test. When the P-value is greater than 0.05, the sample set is considered to belong to a normal distribution and is suitable for OK interpolation (Fig 4).
In the data set needed to simulate the different semivariogram models, the 20% of data set are evenly selected as the test set to test the fitting effect of the different semivariogram models, and the remaining 80% of data set are used to fit the different semivariogram models. When the Gaussian model and exponential model are used to fit the semivariogram, the least squares method is used to fit the 80% of data set. When DNN is used to fit the semivariogram, 80% of data set adopt k-fold cross-validation (k = 10) to adjust the weight of the DNN model. Fig 5  shows the fitting effect of different semivariogram models in four areas.
R 2 represents the regression coefficient, which is calculated using the r2_score of sklearn Library in Python. R 2 reflects the proportion of all variations of dependent variables that can be explained by independent variables through regression relationships. The larger R 2 is, the better the fitting effect of the semivariogram is. Table 1 shows the R 2 obtained by fitting the semivariogram with different methods with the 80% of data set in the four areas. Table 2 shows the R 2 obtained by bringing the 20% of data set as test set into different semivariogram models in the four areas. Tables 1 and 2 show that the semivariogram of the DNN simulation has achieved a good fitting effect.
The kriging variance can be used as a standard to evaluate the interpolation results by indicating the degree of deviation from the theoretical results. Therefore, when OK interpolation is performed in unknown areas, the small kriging square difference is helpful to evaluate the interpolation effect. Table 3 shows the mean theoretical kriging variance of the four regions. DNN-OK refers to OK interpolation using the DNN to simulate the semivariogram, Gaussian-OK refers to OK interpolation using the Gaussian model to simulate the semivariogram, and Exponential-OK refers to OK interpolation using the exponential model to simulate the semivariogram. Table 3 shows that the semivariogram simulated by the DNN produces a small kriging variance. Tables 4 and 5 The following conclusions can be drawn from Tables 1-5, Figs 5 and 6: 1. The DNN-simulated semivariogram can replace the semivariograms based on the exponential and Gaussian models.
2. The interpolation results of the DNN-simulated semivariogram improve the accuracy of OK interpolation and reduce the theoretical kriging variance.  The causes of errors are as follows: 1. Different semivariogram models produce different goodness of fit values, which affects the interpolation accuracy.
2. OK interpolation is carried out on the premise of a standard normal distribution, but the samples are only approximately normally distributed. Generally, if the P-value is greater than 0.05, a distribution is considered to be normal, which is one of the reasons for the interpolation accuracy.

Conclusion
In principle, a DNN can be used to fit all functions; thus, a novel method for fitting semivariograms using a DNN was proposed in this study. The following conclusions were drawn from the study results. 1. A semivariogram is mainly used to reflect spatial correlation. In traditional OK interpolation, various semivariograms (Gaussian model, exponential model, spherical model, etc.) must be analyzed, and the best fitting semivariogram must be selected to reflect spatial correlation. However, these semivariograms have specific models and cannot summarize all geographical distributions. A DNN can theoretically fit all functions. Under the premise of the first law of geography, this capability means that the best fitting semivariogram can be obtained, reflecting the optimal spatial correlation. The advantage of this method is that it can replace most other semivariogram functions and simplify the process of analyzing various semivariogram functions.
2. The kriging variance, which can be used as a criterion to evaluate the interpolation results, represents the degree of deviation from the theoretical results. Therefore, in the case of an unknown spatial distribution, a small kriging variance is helpful to evaluate the kriging interpolation effect. The experimental results show that the best semivariogram fitting has the smallest kriging variance, which means that the results calculated by this method are closer to the theoretical interpolation results. The proposed method optimizes the OK algorithm.
3. Compared with the traditional semivariogram model, the semivariogram based on DNN fitting requires more computing time, so the GPU performance must be improved to save time. However, this approach can save artificial analysis time compared to traditional artificial selection methods for semivariogram analysis.